Service labeling using semi-supervised learning

ABSTRACT

The disclosure provides an approach for workload labeling and identification of known or custom applications. Embodiments include determining a plurality of sets of features comprising a respective set of features for each respective workload of a first subset of a plurality of workloads. Embodiments include identifying a group of workloads based on similarities among the plurality of sets of features. Embodiments include receiving label data from a user comprising a label for the group of workloads. Embodiments include associating the label with each workload of the group of workloads to produce a training data set. Embodiments include using the training data set to train a model to output labels for input workloads. Embodiments include determining a label for a given workload of the plurality of workloads by inputting features of the given workload to the model.

RELATED APPLICATIONS

The present patent application is a continuation of, and hereby claims priority under 35 U.S.C. § 120 to, pending U.S. patent application Ser. No. 16/855,305, entitled “SERVICE LABELING USING SEMI-SUPERVISED LEARNING,” by the same inventors, filed on 22 Apr. 2020, the contents of which are herein incorporated in their entirety by reference for all purposes.

BACKGROUND

Software defined networking (SDN) comprises a plurality of hosts in communication over a physical network infrastructure, each host having one or more virtualized endpoints such as virtual machines (VMs), containers, or other virtual computing instances (VCIs) that are connected to logical overlay networks that may span multiple hosts and are decoupled from the underlying physical network infrastructure. Though certain aspects are discussed herein with respect to VMs, it should be noted that they may similarly be applicable to other suitable VCIs.

For example, any arbitrary set of VMs in a datacenter may be placed in communication across a logical Layer 2 network by connecting them to a logical switch. Each logical switch corresponds to a virtual network identifier (VNI), meaning each logical Layer 2 network can be identified by a VNI. The logical switch is collectively implemented by at least one virtual switch on each host that has a VM connected to the logical switch. The virtual switch on each host operates as a managed edge switch implemented in software by the hypervisor on each host. Forwarding tables at the virtual switches instruct the host to encapsulate packets, using a virtual tunnel endpoint (VTEP) for communication from a participating VM to another VM on the logical network but on a different (destination) host. The original packet from the VM is encapsulated at the VTEP with an outer IP header addressed to the destination host using a mapping of VM IP addresses to host IP addresses. At the destination host, a second VTEP decapsulates the packet and then directs the packet to the destination VM. Logical routers extend the logical network across subnets or other network boundaries using IP routing in the logical domain. The logical router is collectively implemented by at least one virtual router on each host or a subset of hosts. Each virtual router operates as a router implemented in software by the hypervisor on the hosts.

SDN generally involves the use of a management plane (MP) and a control plane (CP). The management plane is concerned with receiving network configuration input from an administrator or orchestration automation and generating desired state data that specifies how the logical network should be implemented in the physical infrastructure. The management plane may have access to a database application for storing the network configuration input. The control plane is concerned with determining the logical overlay network topology and maintaining information about network entities such as logical switches, logical routers, endpoints, etc. The logical topology information specifying the desired state of the network is translated by the control plane into network configuration data that is then communicated to network elements of each host. The network configuration data, for example, includes forwarding table entries to populate forwarding tables at virtual switch(es) provided by the hypervisor (i.e., virtualization software) deployed on each host. An example control plane logical network controller is described in U.S. Pat. No. 9,525,647 entitled “Network Control Apparatus and Method for Creating and Modifying Logical Switching Elements,” which is fully incorporated herein by reference.

The rapid growth of network virtualization has led to an increase in large scale SDN data centers. The scale of such data centers may be very large, often including hundreds of servers with each server hosting hundreds of VCIs. With such scale comes a need to be able to operate such topologies efficiently and securely. Techniques exist for applying security policies and providing other management functions for VCIs based on labels associated with the VCIs. A given VCI may be labeled by an administrator and the label may be used for grouping, security policy enforcement (e.g., based on security groups), statistical analysis of VCIs, and/or the like. For example, a given security policy may apply to all VCIs labeled with a first label. However, an administrator manually applying labels to all VCIs can be a tedious and time-consuming process, particularly as numbers of VCIs in data centers continue to grow. Furthermore, manual labeling of large numbers of VCIs introduces risk of errors. As such, there is a need in the art for improved techniques for labeling VCIs.

SUMMARY

Embodiments provide a method of workload labeling. Embodiments include: determining a plurality of sets of features comprising a respective set of features for each respective workload of a first subset of a plurality of workloads; identifying a group of workloads based on similarities among the plurality of sets of features; receiving label data from a user comprising a label for the group of workloads; associating the label with each workload of the group of workloads to produce a training data set; using the training data set to train a model to output labels for input workloads; and determining a label for a given workload of the plurality of workloads by inputting features of the given workload to the model.

Further embodiments include a non-transitory computer-readable storage medium storing instructions that, when executed by a computer system, cause the computer system to perform the method set forth above, and a computer system programmed to carry out the method set forth above.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts example physical and virtual network components with which embodiments of the present disclosure may be implemented.

FIG. 2 depicts an example of features related to workload labeling according to embodiments of the present disclosure.

FIG. 3 depicts an example of grouping VCIs based on features according to embodiments of the present disclosure.

FIG. 4 depicts an example of receiving labels for grouped VCIs according to embodiments of the present disclosure.

FIG. 5 depicts an example of training a model for workload labeling according to embodiments of the present disclosure.

FIG. 6 depicts example operations for workload labeling according to embodiments of the present disclosure.

FIG. 7 depicts example operations for using workload labels according to embodiments of the present disclosure.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be beneficially utilized on other embodiments without specific recitation.

DETAILED DESCRIPTION

The present disclosure provides an approach for workload labeling and identification of known and custom applications. In the art, a workload generally refers to a computing task with a discrete set of application logic that can be executed independently of any parent or related application logic. A workload may refer to a VCI in certain embodiments. In some embodiments, a workload may refer to an application, a container, or another discrete set of computing logic. While some existing techniques involve an administrator manually applying labels to workloads, these techniques can be inefficient and difficult, particularly as numbers of workloads increase. Machine learning techniques can improve efficiency of workload labeling significantly, but generating training data for use in training a machine learning model for workload labeling can also become inefficient and difficult.

Machine learning models may be trained to associate certain input “features” with output “labels.” Training data for a machine learning model may include labeled sets of features. For example, a training data instance may include a set of one or more features of a workload associated with a label applied to the workload by an administrator. Generally, the larger the training data set, the better the results of the trained model will be. As such, generating large amounts of training data may involve manually labeling large numbers of workloads.

As such, embodiments of the present disclosure involve semi-supervised learning techniques that significantly reduce the amount of manual input required while allowing large training data sets to be generated, resulting in improved results from trained models. In certain embodiments, a subset of all workloads in a networking environment is identified for use in generating training data. The subset may, for example, be a given percentage of all workloads.

One or more features of each workload in the subset are identified. Features of a workload may include, for example, one or more of network ports on which the workload listens/receives traffic when coupled to a network, network ports on which the workload connects to remote processes (e.g., outside of the data center) via the network, processes running on the workload, remote processes to which the workload connects, numbers of connections to a process or port, and the like. The workloads in the subset are then grouped into one or more groups based on feature similarity. In one example, cosine similarity between features of workloads is used to group similar workloads in the subset into a group, as shown in the sketch below.
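
For illustration, the following is a minimal sketch of such similarity-based grouping, assuming each workload has already been reduced to a numeric feature vector; the function names, the greedy seed-based strategy, and the 0.8 threshold are illustrative assumptions rather than details prescribed by the disclosure.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def group_workloads(features: dict, threshold: float = 0.8) -> list:
    """Greedily assign each workload to the first group whose seed vector it matches."""
    groups, seeds = [], []
    for name, vec in features.items():
        for group, seed in zip(groups, seeds):
            if cosine_similarity(vec, seed) >= threshold:
                group.add(name)
                break
        else:
            groups.append({name})  # start a new group seeded by this workload
            seeds.append(vec)
    return groups
```

With binary vectors like those in FIG. 2 below, such a routine would place workloads with identical feature vectors into one group and dissimilar workloads into separate groups.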

The one or more groups of workloads are then presented to a user, such as an administrator, so that the user can provide one or more labels for each group. In certain embodiments, the user provides one or more labels for a single workload in each group. The user may also, in some embodiments, review the groups and identify workloads that do not belong in certain groups. Once a group (e.g., all groups) has a workload that has been assigned a label or labels, the label or labels for that workload may be applied to all workloads in the group. As such, the user does not need to manually label every workload in the subset, and may only provide a single label or set of labels for each group. Multiple labels may be applied to a workload, such as if the workload runs multiple applications.

The labeled groups of workloads are then used as training data for a model. For example, each given workload in the subset may be used as a training data instance comprising the one or more features of the given workload associated with the one or more labels assigned by the user to the group to which the workload belongs.
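
A sketch of this expansion step might look as follows, assuming a `features` mapping from workload names to feature vectors and a `group_labels` mapping from each labeled group to the label the user supplied; both names are hypothetical.

```python
import numpy as np

def build_training_set(features: dict, group_labels: dict):
    """Expand per-group labels into one training instance per group member.

    `features` maps workload name -> feature vector; `group_labels` maps a
    group (e.g., a frozenset of workload names) -> the user-supplied label.
    """
    X, y = [], []
    for group, label in group_labels.items():
        for workload in group:
            X.append(features[workload])  # features of one member of the group
            y.append(label)               # the member inherits the group's label
    return np.vstack(X), np.array(y)
```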

The training data is then used to train the model. Training the model may involve iteratively adjusting parameters of the model based on the training data such that, for a given training data instance, providing the features of the given training data instance as inputs to the model results in an output from the model that matches a label or labels of the training data instance.

The trained model is then used to determine labels for the rest of the workloads that were not included in the subset, new workloads, etc. Features of each given workload are determined and then provided as inputs to the model, and the model outputs one or more labels for the given workload. For example, the model may determine scores for each of a plurality of potential labels, the scores indicating a confidence of whether a given label should be applied to a given workload. If a score for a given label exceeds a threshold, for example, then the given label may be applied to the given workload. As such, techniques described herein allow a potentially large number of workloads to be accurately labeled with minimal user input.

Once workloads are labeled, the labels may be used for a variety of purposes, such as defining security groups, applying security policies, statistical analysis, network segmentation, network monitoring, intrusion detection/prevention, user interface (UI) visualization, and/or the like. For example, the labels may allow workloads to be conveniently grouped in a UI and targeted for a variety of purposes, such as monitoring and the like. Grouping of workloads based on labels determined according to techniques described herein may allow a UI to be de-cluttered and simplified. For example, filters may be applied to workloads based on labels. If a user is not interested in core services, workloads labeled with core services can be filtered (e.g., and excluded from display within a UI).

FIG. 1 depicts example physical and virtual network components with which embodiments of the present disclosure may be implemented.

Networking environment 100 includes data center 130 connected to network 110. Network 110 is generally representative of a network of computing entities such as a local area network (“LAN”) or a wide area network (“WAN”), a network of networks, such as the Internet, or any connection over which data may be transmitted.

Data center 130 generally represents a set of networked computing entities, and may comprise a logical overlay network. Data center 130 includes host(s) 105, a gateway 134, a data network 132, which may be a Layer 3 network, and a management network 126. Data network 132 and management network 126 may be separate physical networks or different virtual local area networks (VLANs) on the same physical network.

Each of hosts 105 may be constructed on a server grade hardware platform 106, such as an x86 architecture platform. For example, hosts 105 may be geographically co-located servers on the same rack or on different racks. Host 105 is configured to provide a virtualization layer, also referred to as a hypervisor 116, that abstracts processor, memory, storage, and networking resources of hardware platform 106 into multiple virtual computing instances (VCIs) 135₁ to 135ₙ (collectively referred to as VCIs 135 and individually referred to as VCI 135) that run concurrently on the same host. VCIs 135 may include, for instance, VMs, containers, virtual appliances, and/or the like.

Hypervisor 116 may run in conjunction with an operating system (not shown) in host 105. In some embodiments, hypervisor 116 can be installed as system level software directly on hardware platform 106 of host 105 (often referred to as “bare metal” installation) and be conceptually interposed between the physical hardware and the guest operating systems executing in the virtual machines. In certain aspects, hypervisor 116 implements one or more logical entities, such as logical switches, routers, etc. as one or more virtual entities such as virtual switches, routers, etc. In some implementations, hypervisor 116 may comprise system level software as well as a “Domain 0” or “Root Partition” virtual machine (not shown) which is a privileged machine that has access to the physical hardware resources of the host. In this implementation, one or more of a virtual switch, virtual router, virtual tunnel endpoint (VTEP), etc., along with hardware drivers, may reside in the privileged virtual machine. Although aspects of the disclosure are described with reference to VMs, the teachings herein also apply to other types of virtual computing instances (VCIs) or data compute nodes (DCNs), such as containers, which may be referred to as Docker containers, isolated user space instances, namespace containers, etc. In certain embodiments, VCIs 135 may be replaced with containers that run on host 105 without the use of a hypervisor.

Gateway 134 provides VCIs 135 and other components in data center 130 with connectivity to network 110, and is used to communicate with destinations external to data center 130 (not shown). Gateway 134 may be a virtual computing instance, a physical device, or a software module running within host 105.

Controller 136 generally represents a control plane that manages configuration of VCIs 135 within data center 130. Controller 136 may be a computer program that resides and executes in a central server in data center 130 or, alternatively, controller 136 may run as a virtual appliance (e.g., a VM) in one of hosts 105. Although shown as a single unit, it should be understood that controller 136 may be implemented as a distributed or clustered system. That is, controller 136 may include multiple servers or virtual computing instances that implement controller functions. Controller 136 is associated with one or more virtual and/or physical CPUs (not shown). Processor(s) resources allotted or assigned to controller 136 may be unique to controller 136, or may be shared with other components of data center 130. Controller 136 communicates with hosts 105 via management network 126.

Manager 138 represents a management plane comprising one or more computing devices responsible for receiving logical network configuration inputs, such as from a network administrator, defining one or more endpoints (e.g., VCIs and/or containers) and the connections between the endpoints, as well as rules governing communications between various endpoints. In one embodiment, manager 138 is a computer program that executes in a central server in networking environment 100, or alternatively, manager 138 may run in a VM, e.g., in one of hosts 105. Manager 138 is configured to receive inputs from an administrator or other entity, e.g., via a web interface or API, and carry out administrative tasks for data center 130, including centralized network management and providing an aggregated system view for a user.

Monitoring appliance 140 generally represents a component of data center 130 that monitors attributes of workloads, such as VCIs, on hosts 105 and performs labeling of workloads according to embodiments of the present disclosure. In one embodiment, monitoring appliance 140 is a computer program that executes in a central server in networking environment 100, or alternatively, monitoring appliance 140 may run in one or more VMs, e.g., in one or more of hosts 105. In one embodiment, monitoring appliance 140 is implemented in a distributed fashion across a plurality of VCIs on a plurality of hosts 105.

In some embodiments, monitoring appliance 140 communicates with an agent on each of hosts 105, such as agent 118 in hypervisor 116, in order to retrieve attributes of VCIs, such as VCIs 135. In some embodiments, attributes are retrieved by hypervisor 116 from endpoint monitoring components (not shown) running on every VCI and/or from network flow data, such as through a virtual switch, monitored by hypervisor 116 on each host 105. Attributes may include, for example, network ports (e.g., coupled to a virtual switch) on which a VCI listens for traffic, network ports on which a VCI connects to remote processes, processes running on a VCI, remote processes to which a VCI connects, numbers of connections to a process or port from a VCI, numbers of processes running on a VCI, command line parameters of a VCI, and/or the like. One or more of the attributes of VCIs are then used as one or more of the features of the VCIs in order to group VCIs based on similarity of features.

For example, monitoring appliance 140 may use cosine similarity between features of VCIs in order to group similar VCIs of a subset of all VCIs in data center 130, as described in more detail below with respect to FIGS. 2 and 3. The groups may be used for efficient labeling of training data by a user, as described in more detail below with respect to FIG. 4. Labeled training data is then used to train a model to output one or more labels when features of a given VCI are input into the model, as described in more detail below with respect to FIG. 5. For instance, the model may be used to label one or more (e.g., all) VCIs in data center 130 other than the subset of VCIs that was used for training data.

Applying labels to VCIs allows them to be more effectively managed. For example, manager 138, or a separate security component, may apply security policies to VCIs based on labels indicative of the known and/or custom services that are running on the VCIs. In one example, a given security policy applies to all VCIs running a particular service, and labels of such VCIs indicate services running on the VCIs. A known service generally refers to a commonly used service (e.g., Microsoft® Active Directory®), while a custom service generally refers to a service that is identified by a user.

In some embodiments, labeled workloads and their features are shared between multiple data centers in order to improve training data for a model that is used across the multiple data centers. This may be accomplished, for example, using a service such as Amazon Web Services® Telemetry.

FIG. 2 depicts an example 200 of features related to workload labeling according to embodiments of the present disclosure. Example 200 includes features of VCIs 135₁, 135₂, and 135₃ of FIG. 1. For instance, the features may have been collected by monitoring appliance 140 of FIG. 1 through interaction with agent 118 of hypervisor 116 of FIG. 1. As shown, each row corresponds to a VCI 135, and each column corresponds to a feature. A value of one in a given cell indicates that the corresponding VCI 135 includes the corresponding feature. A value of zero in a given cell indicates that the corresponding VCI 135 does not include the corresponding feature.

The features include whether each VCI is listening on port 8080 (yes for VCI 135₁ and VCI 135₃ and no for VCI 135₂), whether each VCI is listening on port 1433 (no for VCI 135₁ and VCI 135₃ and yes for VCI 135₂), whether each VCI is connecting on port 80 (yes for VCI 135₁ and VCI 135₃ and no for VCI 135₂), whether each VCI runs local process P1 (yes for VCI 135₁ and VCI 135₃ and no for VCI 135₂), whether each VCI connects to a remote process P2 (yes for VCI 135₁ and VCI 135₃ and no for VCI 135₂), and whether each VCI runs a local process P3 (no for VCI 135₁ and VCI 135₃ and yes for VCI 135₂).
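
Written out as data, the matrix of example 200 is simply one binary vector per VCI; the column shorthand and variable names below are illustrative, not from the disclosure.

```python
import numpy as np

# Columns: listens on 8080, listens on 1433, connects on 80,
#          runs P1, connects to P2, runs P3
features = {
    "VCI 135-1": np.array([1, 0, 1, 1, 1, 0]),
    "VCI 135-2": np.array([0, 1, 0, 0, 0, 1]),
    "VCI 135-3": np.array([1, 0, 1, 1, 1, 0]),
}
```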

It is noted that while the features listed in example 200 are binary features, other features may not be binary. For example, another feature may be a number of connections to a port or a process. Furthermore, example 200 only lists features of three VCIs for illustration purposes, but features of a larger number of VCIs may be determined.

FIG. 3 depicts an example 300 of grouping VCIs based on features according to embodiments of the present disclosure. In some embodiments, example 300 is an adjacency matrix. Example 300 involves grouping VCIs based on features depicted in example 200 of FIG. 2.

Example 300 illustrates a match score for each pair of VCIs. The match scores may, for example, be calculated using cosine similarity between features of each pair of VCIs in a subset of all VCIs in the data center. The match score may be a normalized value between 0 and 1, where a higher match score indicates a closer match. In some embodiments, weights are applied to features when determining similarity between VCIs (e.g., as part of a cosine similarity calculation), as sketched below. For example, features related to commonly used ports and registered ports may be weighted higher than ephemeral ports. In some examples, features related to a given port are weighted based on numbers of connections to the given port (e.g., both as a source and destination). In certain embodiments, weights may be incorporated into feature determination for each workload (e.g., based on the workload's activity and role in the network topology), and each feature may be weighted prior to the cosine similarity calculation. Features may, for example, be normalized based on a scale of enterprise networks and/or based on activities of given workloads (e.g., numbers of connections, etc.). For example, activities of workloads may be monitored for a time to determine numbers of connections and the like, and the monitored information may be used to normalize features prior to calculating cosine similarities.
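
A weighted variant of the similarity calculation might look like the sketch below; the specific weight values and their source (e.g., down-weighting ephemeral-port features, scaling by connection counts) are assumptions for illustration only.

```python
import numpy as np

def weighted_cosine(a: np.ndarray, b: np.ndarray, weights: np.ndarray) -> float:
    """Cosine similarity after per-feature weighting."""
    aw, bw = a * weights, b * weights
    denom = np.linalg.norm(aw) * np.linalg.norm(bw)
    return float(aw @ bw / denom) if denom else 0.0

# Hypothetical weights: registered-port features weighted higher than
# process-related features; an ephemeral-port feature would get less still.
weights = np.array([1.0, 1.0, 1.0, 0.8, 0.8, 0.8])

a = np.array([1, 0, 1, 1, 1, 0])  # VCI 135-1
b = np.array([1, 0, 1, 1, 1, 0])  # VCI 135-3
print(weighted_cosine(a, b, weights))  # 1.0: identical features, so a full match
```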

The match score between VCI 135₁ and VCI 135₂ is 0, the match score between VCI 135₁ and VCI 135₃ is 1, and the match score between VCI 135₂ and VCI 135₃ is 0. Because the match score between VCI 135₁ and VCI 135₃ is 1, VCI 135₁ and VCI 135₃ are grouped together. In other embodiments, some VCIs may have a match score that falls somewhere between 0 and 1, and a match may be determined if the match score exceeds a threshold.

FIG. 4 depicts an example 400 of receiving labels for grouped VCIs according to embodiments of the present disclosure. In example 400, VCIs 135₁, 135₃, and 135₈ have been grouped together as Group 1 and VCIs 135₂, 135₄, 135₇, and 135₁₀ have been grouped together as Group 2. Group 1 and Group 2 may have been determined using cosine similarity between features of VCIs, as described above with respect to FIGS. 2 and 3.

A label is received for each group indicating a service running on the VCIs in the group. Group 1 is labeled “Active Directory,” indicating that the VCIs in Group 1 run Microsoft® Active Directory® services, and Group 2 is labeled “Exchange,” indicating that the VCIs in Group 2 run Microsoft® Exchange services.

In some embodiments, a user provides the labels via a user interface. For instance, the grouped VCIs may be displayed for review and/or labeling. At least a subset of the features of each VCI may be displayed for review. In some embodiments, features considered most important to a workload's inclusion in a group are displayed, such as the features that were most similar to other workloads in the group. In some embodiments, features considered to be less significant, such as features related to ephemeral ports, are not displayed. Ephemeral ports (e.g., short-lived transport protocol ports for internet protocol communications that are automatically allocated from a predefined range) may be considered less significant because they can be randomly chosen by an application and change dynamically. Furthermore, derived features may not be displayed, as they may not be directly interpretable without reference to underlying attributes of a workload. A derived feature is a feature determined based on another feature, such as a dimensionality reduction of a matrix (e.g., principal component analysis, singular value decomposition, matrix factorization, and the like), learned embeddings (e.g., DeepWalk, node2vec, and the like), log normalization, graph centrality, and the like.
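
As one hypothetical instance of a derived feature, the raw feature matrix could be projected onto its principal components; the resulting columns carry no direct meaning on their own, which is why such features might be hidden from the UI.

```python
import numpy as np
from sklearn.decomposition import PCA

raw = np.array([[1, 0, 1, 1, 1, 0],    # rows: workloads, columns: raw features
                [0, 1, 0, 0, 0, 1],
                [1, 0, 1, 1, 1, 0]], dtype=float)
derived = PCA(n_components=2).fit_transform(raw)  # derived features, shape (3, 2)
```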

In some embodiments, displaying the features considered most important to a workload's inclusion in a group increases the explainability of techniques described herein. Informing users of the reasons for groupings allows automated grouping processes to be understood and verified.

The user interface allows a user to efficiently label other VCIs in a group by applying a single label (e.g., “Active Directory”). In some embodiments, the user may determine that one or more VCIs do not belong in a given group, and may provide feedback indicating that the VCI should be removed from the group or may assign a label to the individual VCI that is different than the label applied to the group. If a VCI is removed from a group, it may be added to another group (e.g., the group with which the VCI has a next highest cosine similarity), or may remain ungrouped. Ungrouped VCIs, such as VCIs that are not similar to any other VCIs or VCIs that have been removed from a group, may also be displayed for labeling, or may be removed from the subset used for training data. In certain embodiments, any VCIs not included in the training data may be labeled using the trained model, as described in more detail below with respect to FIG. 5.

The label or labels received from the user for a given group are applied to other VCIs in the given group. Then the labeled VCIs from the groups are used as training data for a model, as described in more detail below with respect to FIG. 5.

FIG. 5 depicts an example 500 of training a model for workload labeling according to embodiments of the present disclosure.

In example 500, model 520 is trained using labeled training data 510. Labeled training data 510 generally represents a plurality of training data instances, each training data instance including features of a given workload associated with a label. For example, labeled training data 510 may include VCIs that were labeled in groups as described above with respect to FIG. 4.

Model 520 may, for example, be a tree-based machine learning model, such as a random forest or gradient boosting machine (e.g., XGBoost or LightGBM) model, or a linear regression model, such as an elastic net, ridge, or lasso regression model. In some embodiments, model 520 is a neural network. In certain embodiments, model 520 may be any type of classification model. Techniques for training model 520 generally involve iteratively adjusting model parameters until outputs from model 520 in response to input features from labeled training data 510 match the labels for the input features in labeled training data 510.
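
As one concrete possibility among the model types listed above, a random forest could be fit on the labeled feature vectors; the training data and hyperparameters below are illustrative assumptions, not values prescribed by the disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[1, 0, 1, 1, 1, 0],    # member of the group labeled "Active Directory"
              [1, 0, 1, 1, 1, 0],    # member of the group labeled "Active Directory"
              [0, 1, 0, 0, 0, 1]])   # member of the group labeled "Exchange"
y = np.array(["Active Directory", "Active Directory", "Exchange"])

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```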

Once trained, model 520 is able to output a label for a set of input features. For example, feature set 530 is provided as input to model 520. Feature set 530 represents features of a VCI 135ᵢ of FIG. 1. Feature set 530 indicates that VCI 135ᵢ is listening on port 8080, not listening on port 1433, connecting on port 80, runs local process P1, connects to remote process P2, and does not run local process P3.

In response to feature set 530, model 520 outputs label 540, which includes a service label of “Active Directory.” It is noted that model 520 may also output additional labels. For example, model 520 may determine confidence scores for each of a plurality of labels, and the confidence scores may be used to determine whether each given label should be applied to feature set 530 (e.g., based on whether the confidence score for a given label exceeds a threshold). Label 540 is applied to VCI 135ᵢ, and may be used for various purposes, such as applying security policies to VCI 135ᵢ. In some embodiments, model 520 is used to determine labels for all VCIs running in data center 130 of FIG. 1 other than the subset of VCIs that were grouped and used for labeled training data 510.
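
Continuing the random forest sketch above, confidence-thresholded labeling might be expressed as follows; the 0.7 threshold is an assumed value rather than one specified by the disclosure.

```python
feature_set = np.array([[1, 0, 1, 1, 1, 0]])  # feature set 530 for VCI 135-i

# One confidence score per known label; keep labels that clear the threshold.
scores = dict(zip(model.classes_, model.predict_proba(feature_set)[0]))
labels = [label for label, score in scores.items() if score >= 0.7]
# e.g., ["Active Directory"], matching label 540 in example 500
```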

Labels output by model 520 may, in some instances, be used to retrain model 520. For example, if label 540 is approved by a user (e.g., if an administrator confirms that label 540 is accurate), then label 540 may be used to generate an additional training data instance for labeled training data 510 that is used to retrain model 520. The additional training data instance may, for example, include feature set 530 associated with label 540. User feedback may alternatively indicate that label 540 is incorrect. In some cases, a user may provide an alternative label for feature set 530, and the alternative label may be used for a training data instance of labeled training data 510 for retraining the model. As such, model 520 may be dynamically retrained over time for improved accuracy.
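
A sketch of such a feedback loop appears below; the function name and the refit-on-every-item policy are simplifying assumptions (a real system would more likely batch feedback before retraining).

```python
import numpy as np

def apply_feedback(model, X, y, feature_set, predicted, corrected=None):
    """Fold a user-confirmed or user-corrected label back into the training
    set and refit. `corrected=None` means the prediction was approved."""
    final_label = corrected if corrected is not None else predicted
    X = np.vstack([X, feature_set])   # add the feature set as a new instance
    y = np.append(y, final_label)     # paired with the confirmed/corrected label
    model.fit(X, y)                   # retrain on the augmented training data
    return model, X, y
```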

FIG. 6 depicts example operations 600 for workload labeling according to embodiments of the present disclosure. For example, operations 600 may be performed by monitoring appliance 140 of FIG. 1.

Operations 600 begin with step 602, where a plurality of sets of features comprising a respective set of features for each respective workload of a first subset of a plurality of workloads is determined. For example, monitoring appliance 140 of FIG. 1 may determine the sets of features through interaction with agent 118 of hypervisor 116 on each of hosts 105 of FIG. 1.

Operations 600 continue with step 604, where a group of workloads is identified based on similarities among the plurality of sets of features. For example, monitoring appliance 140 of FIG. 1 may compare features (e.g., using cosine similarity) of the workloads to group similar workloads.

Operations 600 continue with step 606, where label data is received from a user comprising a label for the group of workloads. In some embodiments, the user provides the label via a user interface that displays the grouped workloads along with features of the grouped workloads.

Operations 600 continue with step 608, where the label is associated with each workload of the group of workloads to produce a training data set. In some embodiments, the training data set comprises a plurality of training data instances, each training data instance including features of a given workload and a label associated with the given workload.

Operations 600 continue with step 610, where the training data set is used to train a model to output labels for input workloads. For instance, monitoring appliance 140 of FIG. 1 may train the model as described above with respect to example 500 of FIG. 5.

Operations 600 continue with step 612, where a label is determined for a given workload of the plurality of workloads by inputting features of the given workload to the model. For example, monitoring appliance 140 of FIG. 1 may input the features of the given workload to the model and receive a label as an output as described above with respect to example 500 of FIG. 5.

In some embodiments, the respective set of features for each respective workload of the first subset of the plurality of workloads comprises one or more of: the respective workload does or does not listen on a given port; the respective workload does or does not connect to a given port; the respective workload does or does not run a given local process; the respective workload does or does not connect to a given remote process; a number of connections between the respective workload and a particular port; or a number of local or remote processes for the respective workload. In some embodiments, certain features of a given workload may be derived from other features of the given workload, as described above.

In some embodiments, identifying the group of workloads based on similarities among the plurality of sets of features comprises calculating cosine similarity among the plurality of sets of features.

In certain embodiments, the label data from the user is received via a user interface in response to displaying a subset of features of workloads in the group of workloads in the user interface.

In some embodiments, input is received from the user indicating that a certain workload should be removed from the group of workloads.

In certain embodiments, the model comprises a tree-based model or a linear regression model. In some embodiments, the model may comprise a type of classification model.

In some embodiments, operations 600 further include performing an action with respect to the given workload based on the label for the given workload, wherein the action comprises one or more of: adding the given workload to a security group; applying a security policy to the given workload; performing network segmentation involving the given workload; or performing intrusion detection or prevention for the given workload.

FIG. 7 depicts additional example operations 700 related to workload labeling according to embodiments of the present disclosure. For example, operations 700 may be performed by monitoring appliance 140 of FIG. 1, manager 138 of FIG. 1, and/or another component, such as a security component.

Operations 700 begin with step 702, where a label for a given workload is retrieved. For example, a label determined using a model according to techniques described herein may be associated with the given workload, and may be retrieved.

Operations 700 continue with step 704, where an action is performed using the given workload based on the label for the given workload. The action may be, for instance, applying a security policy to the given workload based on the label, performing statistical analysis of the given workload using the label (e.g., generating statistics related to performance of all workloads with the label), performing microsegmentation of workloads, including the given workload, based on the label, and the like.

The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities. Usually, though not necessarily, these quantities may take the form of electrical or magnetic signals, where they or representations of them are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms, such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the invention may be useful machine operations. In addition, one or more embodiments of the invention also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The various embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.

One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system; computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc), such as a CD-ROM, a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.

Virtualization systems in accordance with the various embodiments may be implemented as hosted embodiments, non-hosted embodiments, or as embodiments that tend to blur distinctions between the two; all are envisioned. Furthermore, various virtualization operations may be wholly or partially implemented in hardware. For example, a hardware implementation may employ a look-up table for modification of storage access requests to secure non-disk data.

Certain embodiments as described above involve a hardware abstraction layer on top of a host computer. The hardware abstraction layer allows multiple contexts to share the hardware resource. In one embodiment, these contexts are isolated from each other, each having at least a user application running therein. The hardware abstraction layer thus provides benefits of resource isolation and allocation among the contexts. In the foregoing embodiments, virtual machines are used as an example for the contexts and hypervisors as an example for the hardware abstraction layer. As described above, each virtual machine includes a guest operating system in which at least one application runs. It should be noted that these embodiments may also apply to other examples of contexts, such as containers not including a guest operating system, referred to herein as “OS-less containers” (see, e.g., www.docker.com). OS-less containers implement operating system-level virtualization, wherein an abstraction layer is provided on top of the kernel of an operating system on a host computer. The abstraction layer supports multiple OS-less containers each including an application and its dependencies. Each OS-less container runs as an isolated process in userspace on the host operating system and shares the kernel with other containers. The OS-less container relies on the kernel's functionality to make use of resource isolation (CPU, memory, block I/O, network, etc.) and separate namespaces and to completely isolate the application's view of the operating environments. By using OS-less containers, resources can be isolated, services restricted, and processes provisioned to have a private view of the operating system with their own process ID space, file system structure, and network interfaces. Multiple containers can share the same kernel, but each container can be constrained to only use a defined amount of resources such as CPU, memory and I/O. The term “virtualized computing instance” as used herein is meant to encompass both VMs and OS-less containers.

Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest operating system that performs virtualization functions. Plural instances may be provided for components, operations or structures described herein as a single instance. Boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).

We claim:
1. A method of managing workloads, comprising: from a plurality of workloads each comprising a virtual computing instance (VCI) running a plurality of processes, identifying a group of workloads that have similar respective values associated with a set of respective features; displaying information about one of the workloads via a user interface, wherein the information comprises one or more of the respective values associated with the set of respective features; receiving a label via the user interface in response to the displaying of the information about the one of the workloads; associating the label with each workload of the group of workloads; producing a training data set comprising the respective values associated with the respective features of each workload in the group of workloads and the associated label; training a machine learning model using the training data set; generating, using the machine learning model, a predicted label for a new workload of the plurality of workloads by inputting values associated with the respective features of the new workload to the machine learning model, wherein the new workload is not a member of the group of workloads; assigning the predicted label to the new workload; and performing, based on the assigned predicted label, one or more of: generating a visualization; performing statistical analysis with respect to a plurality of workloads that are associated with the assigned predicted label; or segmenting a network.
2. The method of claim 1, wherein the set of features comprises one or more of: a number of connections between the respective workload and a particular port; a number of local or remote processes for the respective workload; the respective workload does or does not listen on a given port; the respective workload does or does not connect to a given port; the respective workload does or does not run a given local process; or the respective workload does or does not connect to a given remote process.
3. The method of claim 1, wherein identifying the group of workloads that have similar respective values associated with the set of respective features comprises calculating cosine similarity among a plurality of sets of features.
4. The method of claim 1, further comprising receiving input via the user interface indicating that a certain workload should be removed from the group of workloads.
5. The method of claim 1, wherein the machine learning model comprises a classification model, a tree-based model, or a linear regression model.
6. The method of claim 1, further comprising adding the new workload to a security group based on the predicted label for the new workload.
7. The method of claim 1, further comprising re-training the machine learning model based on the predicted label for the new workload.
8. The method of claim 1, wherein the generating of the visualization comprises grouping, within the visualization, the plurality of workloads that are associated with the assigned predicted label.
9. The method of claim 1, wherein the performing of the statistical analysis with respect to the plurality of workloads that are associated with the assigned predicted label comprises generating statistics related to performance of the plurality of workloads that are associated with the assigned predicted label.
10. A system for training a machine learning model, comprising: at least one memory; and at least one processor coupled to the at least one memory, the at least one processor and the at least one memory configured to: from a plurality of workloads each comprising a virtual computing instance (VCI) running a plurality of processes, identify a group of workloads that have similar respective values associated with a set of respective features; display information about one of the workloads via a user interface, wherein the information comprises one or more of the respective values associated with the set of respective features; receive a label via the user interface in response to the displaying of the information about the one of the workloads; associate the label with each workload of the group of workloads; produce a training data set comprising the respective values associated with the respective features of each workload in the group of workloads and the associated label; train a machine learning model using the training data set; generate, using the machine learning model, a predicted label for a new workload of the plurality of workloads by inputting values associated with the respective features of the new workload to the machine learning model, wherein the new workload is not a member of the group of workloads; assign the predicted label to the new workload; and perform, based on the assigned predicted label, one or more of: generating a visualization; performing statistical analysis with respect to a plurality of workloads that are associated with the assigned predicted label; or segmenting a network.
11. The system of claim 10, wherein the set of features comprises one or more of: a number of connections between the respective workload and a particular port; a number of local or remote processes for the respective workload; the respective workload does or does not listen on a given port; the respective workload does or does not connect to a given port; the respective workload does or does not run a given local process; or the respective workload does or does not connect to a given remote process.
12. The system of claim 10, wherein identifying the group of workloads that have similar respective values associated with the set of respective features comprises calculating cosine similarity among a plurality of sets of features.
13. The system of claim 10, wherein the at least one processor and the at least one memory are further configured to receive input via the user interface indicating that a certain workload should be removed from the group of workloads.
14. The system of claim 10, wherein the machine learning model comprises a classification model, a tree-based model, or a linear regression model.
15. The system of claim 10, wherein the at least one processor and the at least one memory are further configured to add the new workload to a security group based on the predicted label for the new workload.
16. The system of claim 10, wherein the at least one processor and the at least one memory are further configured to re-train the machine learning model based on the predicted label for the new workload.
17. The system of claim 10, wherein the generating of the visualization comprises grouping, within the visualization, the plurality of workloads that are associated with the assigned predicted label.
18. The system of claim 10, wherein the performing of the statistical analysis with respect to the plurality of workloads that are associated with the assigned predicted label comprises generating statistics related to performance of the plurality of workloads that are associated with the assigned predicted label.
19. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to: from a plurality of workloads each comprising a virtual computing instance (VCI) running a plurality of processes, identify a group of workloads that have similar respective values associated with a set of respective features; display information about one of the workloads via a user interface, wherein the information comprises one or more of the respective values associated with the set of respective features; receive a label via the user interface in response to the displaying of the information about the one of the workloads; associate the label with each workload of the group of workloads; produce a training data set comprising the respective values associated with the respective features of each workload in the group of workloads and the associated label; train a machine learning model using the training data set; generate, using the machine learning model, a predicted label for a new workload of the plurality of workloads by inputting values associated with the respective features of the new workload to the machine learning model, wherein the new workload is not a member of the group of workloads; assign the predicted label to the new workload; and perform, based on the assigned predicted label, one or more of: generating a visualization; performing statistical analysis with respect to a plurality of workloads that are associated with the assigned predicted label; or segmenting a network.
20. The non-transitory computer-readable medium of claim 19, wherein the generating of the visualization comprises grouping, within the visualization, the plurality of workloads that are associated with the assigned predicted label.
21. The non-transitory computer-readable medium of claim 19, wherein the performing of the statistical analysis with respect to the plurality of workloads that are associated with the assigned predicted label comprises generating statistics related to performance of the plurality of workloads that are associated with the assigned predicted label.